We perform a probabilistic analysis of onion routing. The analysis is presented in a black-box model of
anonymous communication in the Universally Composable framework that abstracts the essential properties
of onion routing in the presence of an active adversary that controls a portion of the network and knows all
a priori distributions on user choices of destination. Our results quantify how much the adversary can gain
in identifying users by exploiting knowledge of their probabilistic behavior. In particular, we show that, in
the limit as the network gets large, a user u's anonymity is worst either when the other users always choose
the destination u is least likely to visit or when the other users always choose the destination u chooses. This
worst-case anonymity with an adversary that controls a fraction $b$ of the routers is shown to be comparable
to the best-case anonymity against an adversary that controls a fraction $\sqrt{b}$.

Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General - security
and protection; C.2.4 [Computer-Communication Networks]: Distributed Systems - Distributed
applications; K.4.1 [Computers and Society]: Public Policy Issues - privacy; G.3 [Probability and
Statistics]: probabilistic algorithms

General Terms: Security, Theory 
1. INTRODUCTION

Every day, half a million people use the onion-routing network Tor [Dingledine et al. 2004]
to anonymize their Internet communication. However, the effectiveness of this service,
and of onion routing in general, is not well understood. The approach we take to this
problem is to model onion routing formally all the way from the protocol details to the
behavior of the users. We then analyze the resulting system and quantify the anonymity
it provides. Key features of our model include (i) a black-box abstraction in the
Universally Composable (UC) framework [Canetti 2000] that hides the underlying operation
of the protocol and (ii) probabilistic user behavior and protocol operation.

Systems for communication anonymity generally have at most one of two desirable 
properties: provable security and practicality. Systems that one can prove secure have 
used assumptions that make them impractical for most communication applications. 
Practical systems are ultimately the ones we must care about, because they are the 
ones that will actually be used. However, their security properties have not been
rigorously analyzed or even fully stated. This is no surprise, because practical anonymity
systems have been deployed and available to study for perhaps a decade, while prac- 
tical systems for communications confidentiality and/or authenticity have been in use 
almost as long as there have been electronic communications. It often takes a while for 
theory and practice to catch up to each other. 

Of the many anonymous-communication design proposals (e.g., [Chaum 1981; Chaum 1988;
Reiter and Rubin 1998; Beimel and Dolev 2003; Nambiar and Wright 2006;
Corrigan-Gibbs and Ford 2010]), onion routing [Goldschlag et al. 1996] has had notable
success in practice. Several implementations have been made [Goldschlag et al. 1996;
Syverson et al. 2000; Dingledine et al. 2004], and there was a similar commercial system,
Freedom [Goldberg and Shostack 2001].



As of October 2011, Tor [Dingledine et al. 2004], the most recent iteration of the basic
design, consists of about 3000 routers, provides a total bandwidth of over 1000 MB/s,
and has an estimated total user population of about 500,000 [Loesing et al. 2011].
Because of this popularity, we believe it is important to improve our understanding of
the protocol.

Onion routing is a practical anonymity-network scheme with relatively low overhead 
and latency. Users use a dedicated set of onion routers to forward their traffic, obscur- 
ing the relationship between themselves and their destinations. To communicate with 
a destination, a user selects a sequence of onion routers and constructs a circuit, or
persistent connection, over that sequence. Messages to and from the destination are sent
over the circuit. Onion routing provides two-way, connection-based communication and 
does not require that the destination participate in the anonymity-network protocol. 
These features make it useful for anonymizing much of the communication that takes 
place over the Internet today, such as web browsing, chatting, and remote login. Thus, 
formal analysis and provable anonymity results for onion routing are significant. 

As a step toward the overall goal of bridging the gap between provability and
practicality in anonymous-communication systems, we have formally modeled and analyzed
relationship anonymity [Pfitzmann and Hansen 2000; Shmatikov and Wang 2006] in Tor.
Although this provides just a small part of the complete understanding of practical
anonymity at which our research program is aimed, it already yields nontrivial
results that require delicate probabilistic analysis. We hope that this aspect of the 
work will spur the Theoretical Computer Science community to devote the same level 
of attention to the rigorous study of anonymity as it has to the rigorous study of confi- 
dentiality. 

1.1. Summary of Contributions 

Black-box abstraction: In the present paper, we treat the network simply as a "black
box"¹ to which users connect and through which they communicate with destinations.
The abstraction captures the relevant properties of a protocol execution that the
adversary can infer from his observations: namely, the observed users, the observed
destinations, and the possible connections between the two. In this way, we abstract
away from much of the design specific to onion routing so that our results apply both
to onion routing and to other low-latency anonymous-communication designs. We express
the black-box model within the Universally Composable (UC) security framework
[Canetti 2000], which is a standard way to express the function and security
properties of cryptographic protocols. We tie our functionality to the guarantees of an
actual protocol by showing it reveals as much information about users' communication
as the onion routing protocol we formalized [Feigenbaum et al. 2007] in an I/O-automata
model. Moreover, we discuss how the functionality might be emulated by a protocol
within the UC framework itself.

¹We note that our use of a "black box" is slightly different from the more common uses in
the literature. Black-box access to some cryptographic primitives is commonly used as a
starting point to achieve some other desired functionality. Here we show how, for purposes
of anonymity analysis, we need only consider a black-box abstraction.

Probabilistic model: Our previous analysis in the I/O-automata model was pos- 
sibilistic, a notion of anonymity that is simply not sensitive enough. It makes no dis- 
tinction between communication that is equally likely to be from any one of a hundred 
senders and communication that came from one sender with probability .99 and from 
each of the other 99 senders with probability .000101. An adversary in the real world 
is likely to have information about which scenarios are more realistic than others. In 
particular, users' communication patterns are not totally random. When the adversary 
can determine with high probability, e.g., the sender of a message, that sender is not
anonymous in a meaningful way. 
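
As a small illustration of this gap, the Python sketch below (our own; the helper names are hypothetical) encodes the two sender distributions from the example with exact fractions. Both give every one of the 100 senders nonzero probability, so a possibilistic definition treats them identically, while the probability of the adversary's best single guess differs by a factor of 99.

```python
from fractions import Fraction

# Illustrative sketch (hypothetical helper names): the two sender distributions
# from the example above, with exact rational probabilities.

def support(dist):
    """Senders with nonzero probability: all a possibilistic definition can see."""
    return {s for s, p in dist.items() if p > 0}

def max_probability(dist):
    """Probability of the adversary's single best guess about the sender."""
    return max(dist.values())

N_SENDERS = 100

# Every sender equally likely.
uniform = {s: Fraction(1, N_SENDERS) for s in range(N_SENDERS)}

# Sender 0 with probability 99/100; the remaining 1/100 split evenly over the
# other 99 senders, i.e. 1/9900 each (the text rounds this to .000101).
skewed = {0: Fraction(99, 100)}
skewed.update({s: Fraction(1, 9900) for s in range(1, N_SENDERS)})

assert sum(uniform.values()) == 1 and sum(skewed.values()) == 1

print(support(uniform) == support(skewed))                # True
print(max_probability(uniform), max_probability(skewed))  # 1/100 99/100
```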

Using this intuition, we include a probability measure in our black-box model. For 
any set of actual sources and destinations, there is a larger set that is consistent with 
the observations made by an adversary. The adversary can then infer conditional prob- 
abilities on this larger set using the measure. This gives the adversary probabilistic 
information about the facts we want the network to hide, such as the initiator of a 
communication. 

In the probability measure that we use, each user chooses a destination according to 
some probability distribution. We model heterogeneous user behavior by allowing this 
distribution to be different for different users. We also assume that the users choose 
their circuits by selecting the routers on it independently and at random. 
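
Because hops are chosen independently, whether a circuit's first router is compromised and whether its last router is compromised are independent events, each with probability equal to the fraction of compromised routers. The Monte Carlo sketch below checks this numerically under illustrative assumptions that are ours, not the text's: 1000 routers of which 200 are adversarial, three-hop circuits, and each hop drawn independently and uniformly (so a router may repeat). The estimated rates should come out close to 0.2 and 0.2² = 0.04.

```python
import random

# Minimal sketch under hypothetical parameters: routers are numbered
# 0..n_routers-1 and the first n_compromised of them are adversarial.

def sample_circuit(n_routers, length, rng):
    """Choose each router on the circuit independently and uniformly at random."""
    return [rng.randrange(n_routers) for _ in range(length)]

def end_compromise_rates(n_routers, n_compromised, length, trials, seed=0):
    """Estimate how often the first router, and both end routers, are compromised."""
    rng = random.Random(seed)
    first_hits = both_hits = 0
    for _ in range(trials):
        circuit = sample_circuit(n_routers, length, rng)
        first_bad = circuit[0] < n_compromised   # adversarial entry router
        last_bad = circuit[-1] < n_compromised   # adversarial exit router
        first_hits += first_bad
        both_hits += first_bad and last_bad
    return first_hits / trials, both_hits / trials

b = 200 / 1000  # fraction of compromised routers
rate_first, rate_both = end_compromise_rates(1000, 200, 3, 100_000)
print(rate_first, rate_both)  # approximately b = 0.2 and b**2 = 0.04
```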

After observing the protocol, the adversary can in principle infer some distribution 
on circuit source and destination. He may not actually know the underlying probabil- 
ity measure, however. In particular, it doesn't seem likely that the adversary would 
know how every user selects destinations. In our analysis, we take a worst-case view 
and assume that the adversary knows the distributions exactly. Also, over time he 
might learn a good approximation of user behavior via the long-term intersection
attack [Danezis and Serjantov 2004]. In this case, it may seem as though anonymity has
been essentially lost anyway. However, even when the adversary knows how a user 
generally behaves, the anonymity network may make it hard for him to determine 
who is responsible for any specific action, and the anonymity of a specific action is 
what we are interested in. 

Anonymity metric: We analyze relationship anonymity [Pfitzmann and Hansen 2000;
Shmatikov and Wang 2006] in our onion routing model. Relationship anonymity is obtained
when the adversary cannot identify the destination of a user. In terms of the
conventional subject/action specification for anonymity [Pfitzmann and Hansen 2000], we
can take the action to be communication from a given user u and the subject to be the
destination. Suggested probabilistic metrics for anonymity applied to this case include
the probability assigned to the correct destination [Reiter and Rubin 1998], the entropy
of the destination distribution [Díaz et al. 2002; Serjantov and Danezis 2002], and the
maximum probability within the destination distribution [Tóth et al. 2004], where the
distribution in each case is a conditional distribution given the adversary's view. We
will use the probability assigned to the correct destination as our metric. In part,
this is because it is the simplest metric. Also, statements about the entropy and
maximum-probability metrics make only loose guarantees about the probability assigned to
the actual subject, a quantity that clearly seems important to the individual users.
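
The Python sketch below (hypothetical posteriors, not data from our analysis) shows why entropy and maximum probability guarantee little about the probability of the actual destination: the two posteriors have the same entropy (1.5 bits) and the same maximum probability (0.5), yet assign the true destination d1 probability 0.5 in one case and 0.25 in the other.

```python
import math

# Illustrative sketch: two hypothetical posteriors over destinations d1, d2, d3,
# where the user's actual destination is "d1".

def correct_destination_probability(posterior, actual):
    """The metric used here: posterior probability of the true destination."""
    return posterior.get(actual, 0.0)

def entropy_bits(posterior):
    """Shannon entropy of the posterior, in bits."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)

def max_probability(posterior):
    """Largest probability assigned to any single destination."""
    return max(posterior.values())

posterior_1 = {"d1": 0.5, "d2": 0.25, "d3": 0.25}
posterior_2 = {"d1": 0.25, "d2": 0.5, "d3": 0.25}

for name, post in [("posterior_1", posterior_1), ("posterior_2", posterior_2)]:
    print(name,
          correct_destination_probability(post, "d1"),
          entropy_bits(post),
          max_probability(post))
# posterior_1 0.5 1.5 0.5
# posterior_2 0.25 1.5 0.5
```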

We look at the value of this anonymity metric for a choice of destination by a user. 
Fixing a destination by just one user, say u, does not determine what the adversary 
sees, however. The adversary's observations are also affected by the destinations
chosen by the other users and the circuits chosen by everybody. Because those variables
are chosen probabilistically under the measure we added, the anonymity metric will 
have its own distribution. Several statistics about this distribution might be interest- 
ing; in this paper, we look at its expectation. Unlike other common anonymity metrics, 
our approach lets a user judge how secure he can expect a specific communication 
activity to be and thus whether to do it or not. 

Bounds on anonymity: The distribution of the anonymity metric for a given user and
destination depends on the other users' destination distributions. If their
distributions are very different from that of the given user, the adversary may have an
easy time separating out the actions of the user. If they are similar, the user may more
effectively hide in the crowd.
We provide the following results on a user's anonymity and its dependence on other 
user behavior: 

(1) We show that a standard approximation to our metric provides a lower bound on
it (Thm. 3.3).

(2) We show that the worst case for anonymity over other users' behavior is when
every other user either always visits the destination the user is otherwise least
likely to visit or always visits his actual destination (Cor. 3.7). The former will be
the worst case in most situations.

(3) We give an asymptotic expression for our metric in the worst cases (Thm. 3.6). The
limit of this expression in the most common worst case with an adversary controlling
a fraction $b$ of the network is equal to the lower bound on the metric when the
adversary controls a larger fraction $\sqrt{b}$ of the network. This is significantly worse
than the standard analysis suggests, and it shows the importance of carefully
considering the adversary's knowledge of the system.

(4) We consider anonymity in a more typical set of user distributions in which each 
user selects a destination from a common Zipfian distribution. Because the users 
are identical, every user hides well among the others, and we show that, as the 
user population grows, the anonymity approaches the lower bound (Thm. This 
shows you may be able to use the standard approximation with accurate results if 
you are able to make assumptions about user behavior. 



1.2. Related Work



Ours is not the first formalization of anonymous communication. Early formalizations
used communicating sequential processes [Schneider and Sidiropoulos 1996], graph theory
and possible worlds [Hughes and Shmatikov 2004], and epistemic logic [Syverson and
Stubblebine 1999; Halpern and O'Neill 2005]. These works focused primarily on
formalizing the high-level concept of anonymity in communication. For this reason, they
applied their formalisms to toy examples or systems that are of limited practical
application and can only provide very strong forms of anonymity, e.g., dining-cryptographers
networks. Also, with the exception of Halpern and O'Neill [2005], they have at most a
limited ability to represent probability and probabilistic reasoning. We have focused in
[Feigenbaum et al. 2007] on formalizing a widely deployed and used, practical, low-latency
system.

Halpern and O'Neill [2005] give a general formulation of anonymity in systems that
applies to our model. They describe a "runs-and-systems" framework that provides
semantics for logical statements about systems. They then give several logical
definitions for varieties of anonymity. It is straightforward to apply this framework to
the network model and protocol that we give in [Feigenbaum et al. 2007]. Our possibilistic
definitions of sender anonymity, receiver anonymity, and relationship anonymity then
correspond to the notion of "minimal anonymity" as defined in their paper. The other
notions of anonymity they give are generally too strong and are not achieved in our
model of onion routing.

Later formalizations of substantial anonymous communication systems [Camenisch and
Lysyanskaya 2005; Mauw et al. 2004; Wikström 2004] have not been directly based on the
design of deployed systems and have focused on provability without specific regard for
applicability to an implemented or implementable design. Also, results in these papers
are for message-based systems: each message is constructed to be processed as a
self-contained unit by the appropriate router, typically using the generally available
public encryption key for that router. Such systems typically employ mixing, changing the
appearance and decoupling the ordering of input to output messages at the router to
produce anonymity locally [Chaum 1981]. Onion routing, on the other hand, is circuit
based: before passing any messages with user content, onion routing first lays a circuit
through the routers that provides those routers the keys to be used in processing the
actual messages. Mixing can be combined with onion routing in various ways [Reed et al.
1998], although this is not typical [Dingledine et al. 2004]. Such circuit creation
facilitates bidirectional, low-latency communication and has been an identifying feature
of onion routing since the first public use of the phrase [Goldschlag et al. 1996]. Thus,
while illuminating and important works on anonymous communication, the formalizations
above are not likely to be applicable to low-latency communications, and, despite the
title of [Camenisch and Lysyanskaya 2005], are not analyses of onion routing.

In this paper, we add probabilistic analysis to the framework of Feigenbaum et al. [2007].
Other works have presented probabilistic analysis of anonymous communication [Reiter and
Rubin 1998; Shmatikov 2004; Wright et al. 2004; Danezis 2003; Danezis and Serjantov 2004;
Mathewson and Dingledine 2004; Kesdogan et al. 1998] and even of onion routing [Syverson
et al. 2000]. The work of Shmatikov and Wang [2006] is particularly similar to ours. It
calculates relationship anonymity in mix networks and incorporates user distributions
for selecting destinations. However, with the exception of [Shmatikov 2004], these have
not been formal analyses. Also, whether for high-latency systems such as mix networks or
low-latency systems such as Crowds and onion routing, many of the attacks in these papers
are some form of intersection attack. In an intersection attack, one watches repeated
communication events for patterns of senders and receivers over time. Unless all senders
are on and sending all the time (in a way not selectively blockable by an adversary)
and/or all receivers are receiving all the time, if different senders have different
receiving partners, patterns will arise that eventually differentiate the communication
partners. It has long been recognized that no system design is secure against a
long-term intersection attack. Several of these papers set out frameworks for making that
more precise. In particular, [Danezis 2003], [Danezis and Serjantov 2004], and
[Mathewson and Dingledine 2004] constitute a progression towards quantifying how long it
takes (in practice) to reveal traffic patterns in realistic settings.
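
For readers unfamiliar with the attack, the toy Python sketch below shows its simplest form. It is not the method of any paper cited above; it assumes, hypothetically, that each target always communicates with one fixed partner and that in each round the adversary sees the set of active senders and the set of active receivers. Intersecting the receiver sets of the rounds in which the target is active then isolates the partner.

```python
# Toy intersection attack (illustrative only; the observations are made up).

def intersect_receivers(rounds, target):
    """Intersect the receiver sets of all rounds in which `target` was sending.

    rounds is a list of (active_senders, active_receivers) pairs of sets.
    Returns None if the target never appears.
    """
    candidates = None
    for senders, receivers in rounds:
        if target in senders:
            candidates = set(receivers) if candidates is None else candidates & receivers
    return candidates

# Hypothetical observations: alice talks to x, bob to y, carol to z.
rounds = [
    ({"alice", "bob"}, {"x", "y"}),
    ({"alice", "carol"}, {"x", "z"}),
    ({"bob", "carol"}, {"y", "z"}),
    ({"alice", "bob", "carol"}, {"x", "y", "z"}),
]
for sender in ("alice", "bob", "carol"):
    print(sender, intersect_receivers(rounds, sender))
# alice {'x'}
# bob {'y'}
# carol {'z'}
```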

We are not concerned herein with intersection attacks. We are effectively as- 
suming that the intersection attack is done. The adversary already has a cor- 
rect distribution of a user's communication partners. We are investigating the 
anonymity of a communication in which a user communicates with one of those partners
in the distribution. This follows the anonymity analyses performed in much of the
literature [Kesdogan et al. 1998; Mauw et al. 2004; Reiter and Rubin 1998; Syverson et al.
2000], which focus on finding the source and destination of an individual communication.
Our analysis differs in that we take into account the probabilistic nature of the users'
behavior.
We expect this to have potential practical applications. For example, designs for shared
security-alert repositories to facilitate both forensic analysis for improved security
design and quicker responses to widescale attacks have been proposed [Lincoln et al. 2004].
A participant in a shared security-alert repository might expect to be known to
communicate with it on a regular basis. Assuming reports of intrusions, etc., are
adequately sanitized, the participant's concern should be to hide when updates from that
participant arrive at the repository, i.e., which updates are likely to be from that
participant as opposed to others.



2. TECHNICAL PRELIMINARIES 
2.1. Model 

We describe our analysis of onion routing in terms of an ideal functionality in the
Universal Composability framework [Canetti 2000]. We use such a functionality for three
reasons: first, it abstracts away the details that aren't relevant to anonymity; second,
it precisely expresses the cryptographic protocol properties that are necessary for our
analysis to apply; and third, it immediately suggests ways to perform similar analyses of
other anonymous-communication protocols that may not strictly provide this functionality.

In the onion routing protocol on which we base our model, users choose from a generally
known set of onion routers a subset that will comprise a circuit for communicating
anonymously. Circuit construction has been done in various ways throughout the history
of onion routing. In the first version of onion routing [Goldschlag et al. 1996], and
other early versions [Reed et al. 1998; Goldberg and Shostack 2001], after a user selects
a sequence of onion routers from a publicly known set, the user then creates a circuit
through this sequence using an onion, a data structure effectively composed only of
layers with nothing in the middle. There is one public-key-encrypted layer for each hop
in the circuit, the decryption of which contains the identity of the next hop in the
circuit (if there is one) and keying material for passing data over the established
circuit. In later protocols, such as those used in Cebolla [Brown 2002] and Tor
[Dingledine et al. 2004], the circuit is built via a telescoping protocol that extends the
circuit hop by hop, using the existing circuit for each extension. For all of these, each
hop only communicates with the routers before and after it in the sequence, and the
messages are encrypted once for each router in the circuit so that no additional
information leaks about the identities of the other routers or the destination of the
circuit. Cryptographic techniques are used to counter message forgery. Some later designs
returned to the non-interactive circuit construction of the original [Øverlier and
Syverson 2007; Kate et al. 2007]. It is trivial to see that all of these fit directly
within our model.

Some versions of onion routing, such as those that do iterative discovery of onion
routers via a DHT [Freedman and Morris 2002; Mittal and Borisov 2009; McLachlan et al.
2009], will not fit within our model without some extensions that we do not pursue
herein. This is because the probability of first-last router choice and router compromise
within a circuit can no longer be assumed to be independent. Some anonymity protocols
that do not use onion routing may nonetheless also fit within our model, appropriately
extended. For example, in Crowds [Reiter and Rubin 1998], the adversary can learn from
observing the first and last routers, but the connection to the first router does not
automatically identify the source. On the other hand, the destination is always known to
every router in the circuit. The probability that an observed circuit predecessor is the
source can thus be combined with the observed destination and the a priori
source-destination probability distribution.

The adversary is computationally bounded, non-adaptively compromises an unknown subset
of the onion routers, and can actively attack the protocol. The design of our
functionality is based on the assumption that the ways in which the adversary can narrow
down the possible mappings of users to destinations are determined by the set of circuits
for which he controls the first router and the set of circuits for which he controls the
last router. This assumption comes from the results of Feigenbaum et al. [2007], which we
explicitly relate to our ideal functionality in Sec. 2.4.

Our ideal functionality models anonymous communication over some period of time. It
takes as input from each user the identity of a destination. For every such connection
between a user and destination, the functionality may reveal to the adversary the
identity of the user, the identity of the destination, or both. Revealing the user
corresponds in onion routing to the first router in the circuit being compromised, and
revealing the destination corresponds to the last router being compromised. We note that
we include only information flow to the adversary in this functionality rather than try
to capture the type of communication primitive offered by onion routing, because our
focus is analyzing anonymity rather than defining a useful anonymous-communication
functionality. This model is reminiscent of the general model of anonymous communication
used by Kesdogan et al. [2002] in their analysis of an intersection attack. However, we
do make a few assumptions that are particularly appropriate for onion routing.

First, the functionality allows the adversary to know whether or not he has directly
observed the user. This is valid under the assumption that the initiating client is not
located at an onion router itself. This is the case for the vast majority of circuits in
Tor and in all significant deployments of onion routing and similar systems to date. We
discuss this assumption further in Section 5.

Second, we assume that every user is responsible for exactly one connection in a 
round. Certainly users can communicate with multiple destinations simultaneously 
in actual onion-routing systems. However, it seems likely that in practice most users 
have at most some small (and fixed-bound) number of active connections at any time. 
To the extent that multiple connections are only slightly more likely to be from the 
same user than if all connections were independently made and identically distributed, 
this is a reasonable approximation. This is increasingly true as the overall number of 
connections grows. To the extent that multiple connections are less likely to be from 
the same user, this is a conservative assumption that gives the adversary as much
power to break anonymity as the limited number of user circuits can provide. 

Third, the functionality omits the possibility that the adversary observes the user and
destination but does not recognize that those observations are part of the same
connection. This is another conservative assumption that is motivated by the existence of
timing attacks that an active adversary can use to link traffic that it sees at various
points along its path through the network [Syverson et al. 2000]. In a timing attack, the
adversary observes the timing of the messages going into the onion-routing network and
matches them to similar patterns of messages coming out of the onion-routing network
slightly later. Such attacks have been experimentally demonstrated [Øverlier and Syverson
2006; Bauer et al. 2007] and are easy to mount.
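
A toy version of such a timing attack can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a description of the cited experiments: timestamps are in seconds, the adversary already knows the typical network delay, and matching uses the Pearson correlation of per-second message counts. The exit stream whose pattern best matches the delayed entry pattern is taken to belong to the same circuit.

```python
import math

# Toy timing-correlation sketch; all data and parameters are hypothetical.

def bin_counts(timestamps, width, n_bins):
    """Count messages per time bin of the given width."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((c - my) ** 2 for c in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def best_match(entry_times, exit_streams, delay, width=1.0, n_bins=20):
    """Return the exit stream whose pattern best matches the delayed entry pattern."""
    pattern = bin_counts([t + delay for t in entry_times], width, n_bins)
    scores = {name: correlation(pattern, bin_counts(times, width, n_bins))
              for name, times in exit_streams.items()}
    return max(scores, key=scores.get), scores

entry_times = [0.2, 0.5, 3.1, 3.3, 3.4, 9.0]
exit_streams = {
    "d1": [t + 0.55 for t in entry_times],  # same traffic: 0.5 s delay plus 0.05 s jitter
    "d2": [1.0, 5.0, 12.0, 15.5],           # unrelated traffic
}
best, scores = best_match(entry_times, exit_streams, delay=0.5)
print(best)  # d1
```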

Note that our model does not capture several known attacks on anonymity in onion
routing. In particular, it does not include attacks exploiting resource interference
[Murdoch and Danezis 2005; Murdoch 2006], heterogeneity in network latency [Hopper et al.
2010], correlated destinations between rounds, and identifying patterns of communication
[Herrmann et al. 2009]. We do not include such attacks primarily to focus on the most
important threats to anonymity, because many of the omitted attacks are attacks on
underlying systems rather than on the protocol (e.g., interference) or have limited
effectiveness or are mitigated by improvements to the protocol. Also, we see the analysis
of our simplified model as a first step in establishing rigorous guarantees of anonymity
in increasingly realistic models.
Let $U$ be the set of users with $|U| = n$. Let $\Delta$ be the set of destinations. Let $R$
be the set of onion routers. Let $\mathcal{F}_{\mathrm{or}}$ be the ideal functionality. $\mathcal{F}_{\mathrm{or}}$ takes the set $A$ of
compromised parties from the adversary at the beginning of the execution. Let
$b = |A \cap R| / |R|$. When user $u$ forwards his input from the environment to $\mathcal{F}_{\mathrm{or}}$, the
functionality checks to see if it is some $d \in \Delta$. If so, $\mathcal{F}_{\mathrm{or}}$ sends to the adversary one of
the following, choosing each with the probability given:

(1) $(\bot, \bot)$ with probability $(1 - b)^2$

(2) $(u, \bot)$ with probability $b(1 - b)$

(3) $(\bot, d)$ with probability $(1 - b)b$

(4) $(u, d)$ with probability $b^2$.
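
The four cases are exactly what one obtains if the first and last routers of a circuit are compromised independently, each with probability $b$. The Python sketch below makes this concrete; the function names are hypothetical, `None` stands in for $\bot$, and it compares the exact case probabilities with frequencies from simulated connections for an arbitrarily chosen $b = 0.3$.

```python
import random
from collections import Counter

BOT = None  # stands in for the symbol ⊥ ("nothing revealed")

def f_or_reveal(user, destination, b, rng):
    """Sketch of what the ideal functionality sends the adversary for one connection.

    The user is revealed when the first router is compromised and the destination
    when the last router is; each happens independently with probability b.
    """
    first_compromised = rng.random() < b
    last_compromised = rng.random() < b
    return (user if first_compromised else BOT,
            destination if last_compromised else BOT)

def reveal_probabilities(user, destination, b):
    """Exact output distribution, cases (1) to (4) in the text."""
    return {
        (BOT, BOT): (1 - b) ** 2,
        (user, BOT): b * (1 - b),
        (BOT, destination): (1 - b) * b,
        (user, destination): b ** 2,
    }

b = 0.3
rng = random.Random(1)
trials = 100_000
counts = Counter(f_or_reveal("u", "d", b, rng) for _ in range(trials))
for outcome, prob in reveal_probabilities("u", "d", b).items():
    print(outcome, prob, counts[outcome] / trials)  # exact vs. empirical frequency
```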

To analyze the anonymity provided by the ideal functionality, we make two assumptions
about the inputs from the environment. First, we assume that the environment selects the
destination of user $u$ from a distribution over $\Delta$, where we denote the probability that
$u$ chooses $d$ as $p^u_d$. Second, we assume that the environment sends a destination to each
user. Note that these assumptions need not be made when showing that a protocol
UC-emulates $\mathcal{F}_{\mathrm{or}}$.

We refer to the combination of the adversary model, the assumptions about the
environment, and the ideal functionality as the black-box model. Let $C$ be the relevant
configuration resulting from an execution. $C$ includes a selection of a destination by
each user, $C_D : U \to \Delta$, a set of users whose inputs are observed, $C_I : U \to \{0, 1\}$, and
a set of users whose outputs are observed, $C_O : U \to \{0, 1\}$. A user's input, output, and
destination will be called its circuit.

For any configuration, there is a larger set of configurations that are consistent with
the outputs that the adversary receives from $\mathcal{F}_{\mathrm{or}}$. We will call two configurations
indistinguishable if the sets of inputs, outputs, and links between them that the
adversary receives are the same. We use the notation $C \sim C'$ to indicate that
configurations $C$ and $C'$ are indistinguishable.
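
One way to make indistinguishability concrete is sketched below in Python, under our own encoding assumptions: a configuration is a triple of dicts $(C_D, C_I, C_O)$ keyed by user, `None` stands in for $\bot$, and the adversary's observations are the multiset of pairs that $\mathcal{F}_{\mathrm{or}}$ sends, one per user. A multiset rather than a set is used because two users may choose the same destination. The helper names are hypothetical.

```python
from collections import Counter

BOT = None  # stands in for the symbol ⊥

def adversary_view(config):
    """Multiset of the (user-or-⊥, destination-or-⊥) pairs the adversary receives.

    config is a triple (C_D, C_I, C_O) of dicts keyed by user: the chosen
    destination, whether the input is observed, and whether the output is observed.
    """
    c_d, c_i, c_o = config
    return Counter((u if c_i[u] else BOT, c_d[u] if c_o[u] else BOT) for u in c_d)

def indistinguishable(c1, c2):
    """C ~ C' when the adversary receives exactly the same outputs."""
    return adversary_view(c1) == adversary_view(c2)

# Hypothetical two-user example: only one output is observed, and it shows d1.
C = ({"u": "d1", "v": "d2"}, {"u": False, "v": False}, {"u": True, "v": False})
# Swap who went where: now v's circuit is the one whose exit at d1 is observed.
C_swapped = ({"u": "d2", "v": "d1"}, {"u": False, "v": False}, {"u": False, "v": True})
# Same destinations as C, but u's input is now observed as well.
C_u_seen = ({"u": "d1", "v": "d2"}, {"u": True, "v": False}, {"u": True, "v": False})

print(indistinguishable(C, C_swapped))  # True
print(indistinguishable(C, C_u_seen))   # False
```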

2.2. Probabilistic Anonymity 

A user performs an action anonymously in a possibilistic sense if there is an indistin- 
guishable configuration in which the user does not perform the action. For example, 
under this definition a user with observed output but unobserved input sends that out- 
put anonymously if there exists another user with unobserved input. The probability 
measure we have added to configurations allows us to incorporate the degree of cer- 
tainty that the adversary has about the subject of an action. After making observations 
in the actual configuration, the adversary can infer a conditional probability distribu- 
tion on configurations. There are several candidates in the literature for assessing 
an anonymity metric from this distribution. The probabilistic anonymity metric that 
we use is the posterior probability of the correct subject. The lower this is, the more 
anonymous we consider the user. 

2.3. Relationship Anonymity 

We analyze the relationship anonymity of users and destinations in our model, that is, 
how well the adversary can determine if a user and destination have communicated. 
Our metric for the relationship anonymity of user $u$ and destination $d$ is the posterior
probability $\psi$ that $u$ chooses $d$ as his destination. We study $\psi$ directly, although the
anonymity of a user's communication with a destination is $1 - \psi$.
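
For small examples, $\psi$ can be computed by brute force. The Python sketch below is our illustration, not an algorithm from the analysis: it enumerates every configuration, keeps those indistinguishable from the actual one, weights each by its prior probability (destinations drawn from $p^u_d$, each input and output observed independently with probability $b$), and divides the weight of those in which $u$ chose $d$ by the total. The enumeration is exponential in the number of users, so it only suits toy instances; the users, destinations, and probabilities in the usage lines are hypothetical.

```python
from collections import Counter
from itertools import product

BOT = None  # stands in for the symbol ⊥

def view(c_d, c_i, c_o):
    """Multiset of pairs the adversary receives from the functionality."""
    return Counter((u if c_i[u] else BOT, c_d[u] if c_o[u] else BOT) for u in c_d)

def weight(c_d, c_i, c_o, p, b):
    """Prior probability of a configuration: destinations from p, observations from b."""
    w = 1.0
    for u in c_d:
        w *= p[u].get(c_d[u], 0.0)
        w *= b if c_i[u] else 1 - b
        w *= b if c_o[u] else 1 - b
    return w

def psi(actual, user, d, p, b):
    """Posterior probability that `user` chose `d`, given the view produced by `actual`."""
    users = list(actual[0])
    destinations = sorted({x for dist in p.values() for x in dist})
    target = view(*actual)
    num = den = 0.0
    for ds in product(destinations, repeat=len(users)):
        for ins in product((False, True), repeat=len(users)):
            for outs in product((False, True), repeat=len(users)):
                c_d, c_i, c_o = (dict(zip(users, x)) for x in (ds, ins, outs))
                if view(c_d, c_i, c_o) != target:
                    continue  # not consistent with what the adversary saw
                w = weight(c_d, c_i, c_o, p, b)
                den += w
                if c_d[user] == d:
                    num += w
    return num / den

# Hypothetical example: v rarely visits d1, so an observed d1 most likely came from u.
p = {"u": {"d1": 0.5, "d2": 0.5}, "v": {"d1": 0.1, "d2": 0.9}}
actual = ({"u": "d1", "v": "d2"}, {"u": False, "v": False}, {"u": True, "v": False})
print(round(psi(actual, "u", "d1", p, b=0.2), 4))  # 0.9167, i.e. 11/12
```

Here the prior probability that u chose d1 is 0.5, but observing a single unlinked exit to d1 raises the posterior to 11/12, because v is unlikely to have produced it.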

Using the posterior probability makes sense in this context because it focuses on the 
information that users are trying to hide — their actual destinations — without being 
affected by information the adversary learns about other destinations. Onion routing 
does leak information, and using a metric such as the entropy of the posterior
distribution or the statistical distance from the prior may not give a good idea of how well the
adversary can correctly guess the user's behavior. Designers may wish to know how
well a system protects communications on average or overall. But it is also important 
for a user to be able to assess how secure he can expect a particular communication 
to be in order to decide whether to create it or not. This is the question we address. 
Moreover, the metric is relatively simple to analyze. Furthermore, to the extent that 
the user may not know how he fits in and thus wishes to know the worst risk for any 
user, that is just a lower bound on our metric. 

The relationship anonymity of u and d varies with the destination choices of the 
other users and the observations of the adversary. If, for example, us output is ob- 
served, and the inputs of all other users are observed, then the adversary knows 
us destination with probability 1. Because we want to examine the relationship 
anonymity of u conditioned only on his destination, we end up with a distribution 
on the anonymity metric. We look at the expectation of this distribution. Moreover, 
because this distribution depends on the destination distributions of all of the users, 
we continue by finding the worst-case expectation in the limit for a given user and 
destination and then examine the expectation in a more likely situation. 

2.4. Emulating the Ideal Functionality 

The anonymity analysis of the ideal functionality J-qr that we perform in Sections [3] 
and S] is meaningful to the extent that J^or captures the information that an adver- 
sary can obtain by interacting with onion-routing protocols. We justify the functional- 
ity primarily by showing that it provides the sam e information about the s ource of a 
given connection as onion-routing as formalized by |Feigenbaum et al. [2007 1. Further- 



more, towards a more standard cryptographic analysis, we describe the way in which it 
should be possible to UC-emulate Tor, although we do not provide su ch a result here. 

Relationship to I/O-automata model Feigen baum et al. [2007| | formalize onion 
routing using an I/O-automata model[ |Lynch 1996J and an idealization of the crypto- 
graphic properties of the protocol. Their analysis identifies the user states that are 
information-theoretically indistinguishable. The black-box model we provide herein is 
a valid abstraction of that formalization because, under some reasonable probability 
measures on executions, it preserves the relationship-anonymity properties. 

The I/O-automata model includes a set of users U, a set of routers R, an adversary 
A c R, and a set of destinations A, where we take the final router in the I/O-automata 
model to be the destination and assume that it is uncompromised. A configuration C in 
the I/O-automata model is a mapping from each user m e [/ to a circuit (?'", ■•.,?") e i?', 
a destination d" G A, and a circuit identifier n" e N+. An execution is a sequence of 
I/O-automaton states and actions, which must be consistent with the configuration. 

Let users in the I/O-automata model choose the other routers in their circuits uni- 
formly at random and choose the destination accordin g to user-specific distribu tions. 
Given these circuits and a set of adversary automata, |Feigenbaum et al. [2007| iden- 
tifies an equivalence class of circuit and destination choices such that, for every pair 
of configurations in the class, a bijection exists between their executions such that 
paired executions are indistinguishable. Let the indistinguishable executions thus 
paired have the same probability, conditional on their configuration. 

Given this measure, the black-box model that abstracts the I/O-automata model has 
the same user set U, the same destination set A, an adversary parameter of 6 = |y4|/|i?|, 
and the same destination distributions. The following theorem shows that each pos- 
terior distribution on the destinations of users has the same probability under both 
the I/O-automata model and its black-box model. Let E he a random I/O-automata 
execution. Let X°- be a random I/O-automata configuration {X°- can be viewed as a 
function mapping a random execution to its configuration). Let X'' be a random black- 
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box configuration. Let i{,'i{u,d,E) be the posterior probability that u visited d in the 
I/O-automata model, i.e., the conditional given that the execution is indistinguishable 
from E. Let ijj2{u,d,X^) be the posterior probability that u visited d in the black-box 
model, i.e., the conditional distribution given that the configuration is indistinguish- 
able from X''. Let 7/'o(wj d) be a distribution over destinations d for every u. 

Theorem 2.1. 

Pr\iueu.de/:^i'i{u,d,E) ^ Vo(w,c^)] = Pr\iueu,deA'^^2{u,d, X'') = ijjQ{u,d)] 

Proof. Let be the map from I/O-automata configurations to black-box configura- 
tions such that 

(1) 0(c)z)H = rf" 

1 if ri e A 



(2) <j>{C)i{u)-- 

(3) (i>{C)o{u) 



otherwise 

1 if n e ^ 

otherwise 



4) essentially "quotients out" the specific router choices of each user, retaining the com- 
promised status of the first and last routers as well as the destination. It allows us to 
relate the posterior tjji in the I/O-automata model to the V^2 in the black-box model. 

Let Ci be any I/O-automata configuration. Given any execution e of , the adver- 
sary's posterior probability on configurations is 



if C2 « Cf and otherwise, because we set equal the probability of two execu- 
ti ons that are paired with each other in the bijection on executions constructed 



m [Feigenbaum et al. [2007) . Because the configurations determine which destination 
each user visits, the distribution d, e) can be determined from the posterior dis- 
tribution on configurations. Notice that this distribution only puts positive probability 
on the set C" of configurations that are indistinguishable from C^. 

The posterior distribution on I/O-automata configurations induces a posterior distri- 
bution on black-box configurations via 4>. (j) preserves the destination of each user, and 
so the distribution ^pi{u,d,e) can be determined from this distribution on black-box 
configurations. Notice that this distribution only puts positive probability on the set of 
black-box configurations (f>{C°') that are mapped to by I/O-automata configurations in 
C. 

To understand the set ^(C") and its posterior distribution given e, consider the equiv- 
alence class C'' of the configuration (t>{Ci). Let S be those configurations in C" that differ 
fr om Cj* only in the destinat ions and the permutation of users. From Theorems 1 and 2 
in [Feigenbaum et al. [2007] |, it follows that is a bijection between S and C''. The pos- 
terior probability of each C2 € S is proportional to Pr[X'' = 0(C2 )] because the prior 
probability of is Pr[X'^ = (/)(Cf )] multiplied by the probability selecting its given 
routers (which are the same for all s e S) given that (t){X°-) ~ 4>{C2). Moreover, all of 
the other configurations in C° are reached by changing the unobserved routers of one 
of the configurations in S. is invariant under such a change. Also, the posterior prob- 
ability is invariant under such a change because the routers are chosen independently 
and uniformly at random. Furthermore, the number of I/O-automata configurations 
that are reached by such a change from some s e 5 is the same for all s. Therefore, the 
posterior probability Pr\(i)(X'') = is proportional to Pr\X^ = for e C\ and 
is zero otherwise. Therefore, ipi{u, d, e) = '4'2{u, d, ^(C")). 
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By this equality, the probabihty that a random execution E results in a given poste- 
rior ijjoiu,d) is equal to the probability that the I/O-automata configuration X"- maps 
under to a black-box configuration 4>{X°-) = such that ■4>2{u,d,C'^) = 'iJjo{u,d). 
The probability Pr[0(X°) = C''] is equal to Pr[X'> = C''] because the probability of 
first-router compromise and the probability of an input being observed are both b, last- 
router compromise and an output being observed are both independent events with 
probability b, and user destinations are chosen independently in both models and fol- 
low the same distributions. Therefore, 



Pr[\fu£U,d£A'(pi{u,d,E) = tpo{u,d)] = Pr[V„gc/,deAV'2(u, d, X) = ipo{u,d)]. 



□ 



UC -emulation Expressing our black-box model within the UC framework allows 
it to be compared to protocols expressed within the same framework. In particular, 
if a protocol can be shown to UC-emulate Tor, then, making only common crypto- 
graphic assumptions, the adversary can make only negligibly better guesses about 
users' communication when interacting with that protocol t han he can with the func- 
tionality. The results of |Camenisch and Lysyanskaya [2005) suggest that such emula- 
tion is indeed possible. An onion routing protocol similar to their protocol combined 
wit h a me ssage transmission functionality that hides messages not to corrupt parties 
(c£ ICanett i [2000]), should indeed hide the routers that are not corrupt or next to cor- 
rupt routers on a circuit. Then J-qr. provides the adversary with all the information 
about user inputs that a simulator needs in order to simulate the rest of the protocol 
and confuse the adversary. 



3. EXPECTED ANONYMITY 

Let the set C of all configurations be the sample space and X be a random configura- 
tion. Let ^' be the posterior probability of the event that u chooses d as a destination, 
that is, 4'(C) = PrlXoiu) = d\X w C]. ^' is our metric for the relationship anonymity 
of u and d. 

Let N'^ represent the set of multisets over A. Let p(A°) be the maximum number of 
orderings of A" e such that the same destination is in any given location in every 
ordering: 

p{A")^ll\{SeA'}\\ 

Let n(A, B) be the set of all injective maps A ^ B. The following theorem gives 
an exact expression for the conditional expectation of ^' in terms of the underlying 
parameters U, A, p, and b: 
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Theorem 3.1. 

E[^\Xd{u) = rf] = 6(1 - b)p^ + 

SCU:ueS AOeN^:|AO|<S 

E E ^'^^ n pk^) 

^TCS-«:|T| = |A0|-1 7ren(r+«,A0):ir(M)=d uST 

2 

E E p'^Iipii^)' 

TCS-u:|T| = |AO| iren(T,AO) i;GT 

\TC5:|T| = |A0| 7rGn(T,A0) ueT J 

Proof. At a high level, the conditional expectation of 4' can be expressed as: 

E[*|Xi5(m) = = E = <^l^c(") = rf]*(C)- 

We calculate ^' for a configuration C by finding the relative weight of indistinguish- 
able configurations in which u selects d. The adversary observes some subset of the 
circuits. If we match the users to circuits in some way that sends users with observed 
inputs to their own circuits, the result is an indistinguishable configuration. Similarly, 
we can match circuits to destinations in any way that sends circuits on which the 
output has been observed to their actual destination in C. 

The value of ^{C) is especially simple if m's input has been observed. If the output 
has not also been observed, then ^(C) = p^'. If the output has also been observed, then 
*(C) = 1. 

For the case in which us input has not been observed, we have to take into account 
the destinations of and observations on the other users. Let 5 c [/ be the set of users 
s such that C/(s) = 0. Note that u <e S. Let A° be the multiset of the destinations of 
circuits in C on which the input has not been observed, but the output has. 

Let ,fa{S, A") be the probability that in a random configuration the set of unobserved 
inputs is S and the set of observed destinations with no corresponding observed input 
is AO; 

/o(5,A") = 6"-I^W^"l(l-6fl^l-l^"l[p(A°ri Y E Epli^y 

TGS:\T\ = \AO\ 7ren(T,A0) veT 

Let /i (5, A") be the probability that in a random configuration the set of unobserved 
inputs is S, the set of observed destinations with no corresponding observed input is 
A°, the output of u is observed, and the destination of u is d: 

A") = 6"-ls|+|A"|(i _ fe)2|^^l-l^"l[p(A")]-Vd- 

E E n^^-w 

TCS-ti:|T| = |A"|-l ■iTen(T+u,A»):iT(u)=d uST 

Let /2(<S, A'') be the probability that in a random configuration the set of unobserved 
inputs is S, the set of observed destinations with no corresponding observed input is 
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A", the output of u is unobserved, and the destination of u is d: 

h{S, AO) = 6"-|S| + |A"| (1 „ f,)2|S|-|A°| [p(AO)]-lp«. 

TCS-jj:|T| = |AO| 7ren(T,A0) veT 

Now we can express the posterior probabihty ^{C) as: 

^ A(^,AO) + /.(^,A") 

The expectation of * is a sum of the above posterior probabihties weighted by their 
probabihty. The probabihty that the input of u has been observed but the output hasn't 
is 6(1 — h). The probabihty that both the input and output of u have been observed is 
5^. These cases are represented by the first two terms in Equation [TJ 

When the input of u has not been observed, we have an expression of the posterior 
in terms of sets 5* and A°. The numerator (/i + /2) of Equation [2] itself actually sums 
the weight of every configuration that is consistent with S, /SP , and the fact that the 
destination of u is d. However, we must divide by p^, because we condition on the event 
{Xd{u) = d}. 

These observations give us the final summation in Equation [H □ 
3.1 . Simple approximation of conditional expectation 

The expression for the conditional expectation of ^ in Equation [l] is difficult to inter- 
pret. I t would be nice if we c ould find a simple approximation. The probabilistic anal- 
ysis in Syverson et al. [2000[ proposes just such a simplification by reducing it to only 



two cases: i) the adversary observes the user's input and output and therefore iden- 
tifies his destination, and ii) the adversary doesn't observe these and cannot improve 
his a priori knowledge. The corresponding simplified expression for the expection is: 

E[^XD{u) = d]^h' + {l-b^)p''^. (3) 

This is a reasonable approximation if the final summation in Equation [1] is about 
(1 — b)p'^. This summation counts the case in which us input is not observed, and 
to achieve a good approximation the adversary must experience no significant advan- 
tage or disadvantage from comparing the users with unobserved inputs (5*) with the 
discovered destinations (A"). 

The quantity (1 — b)p^ does provide a lower bound on the final summation. It may 
seem obvious that considering the destinations in A" can only improve the accuracy 
of adversary's prior guess about us destination. However, in some situations the pos- 
terior probability for the correct destination may actually be smaller than the prior 
probability. This may happen, for example, when some user v, v ^ u, communicates 
with a destination e, e ^ d, and only u is a priori likely to communicate with e. If the 
adversary observes the communication to e, it may infer that it is likely that u was 
responsible and therefore didn't choose d. 

It is true, however, that in expectation this probability can only increase. Therefore 
Equation [3] provides a lower bound on the anonymity metric. 

The proof of this fact relies on the following lemma. Let £ be an event in some finite 
sample space D,. Let ^i, . . . , Ai be a set of disjoint events such that £ c IJ . At, and let 

A = ULi Let £i = £r\ A^. Finally let = l£,(a;)Pr[£.,]/Pr[A] (where l^, is 
the indicator function for £i). Y{uj) is thus the conditional probability Pr[£|A], where 
u) € £i. 

Lemma 3.2. Pr[£|A] < E[Y\£] 
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Proof. 

Pr[S] 



( 



Pr[^"]Pr[£] 



by a simple rewriting 



< -^^^^ Pr [ A^ ] Pr[£\ ~ Cauchy-Schwartz inequality 

— Pr\Ai\Pr{£.\ 

= e[y\e\ 

□ 

Theorem 3.3. e[^Xd{u) = d] > 6^ + (i - b'^)p^ 
Proof. As described in the proof of Theorem |3. 1[ 

E[^\Xd{u) =d]=b'^ + b{l - b)p'^ + (1 - b)E[^\XD{u) = dAXi{u) = 0]. 

To apply Lemma [3l2l take the set of configurations C to be the sample space O. Take 
{Xd{u) ~ d} to be the event £. Take the indistinguishability equivalence relation to 
be the sets A- Finally, take * to be Y. Then the lemma shows that E[^\Xd{u) — 
dAXi{u) = 0]>p^. □ 

3.2. Worst-case Anonymity 

To examine the accuracy of our approximation, we look at how large the final summa- 
tion in Equation [l] can get as the users' destination distributions vary. Because this 
is the only term that varies with the other user distributions, this will also provide a 
worst-case guarantee on expected anonymity metric. Our results will show that, in the 
limit as the number of users grows, the worst case can occur when the users other than 
u act as differently from u as possible by always visiting the destination u is otherwise 
least likely to visit. Less obviously, we show that the limiting maximum can also occur 
when the users other than u always visit d. This happens because it makes the adver- 
sary observe destination d often, causing him to suspect that u chose d. Our results 
also show that the worst-case expectation is about 5 + (1 - b)p]^, which is significantly 
worse than the simple approximation above. 

As the first step in finding the maximum of Equation[T]over {p^)y^u, we observe that 
it is obtained when every user v ^ u chooses only one destination d„, i.e. — 1 for 
some dy <E A. 

Lemma 3.4. A maximum of E^^lXoiu) ~ d] over {p^)y^u must occur when, for all 
V ^ u, there exists some dy e A such that p'^^ = 1. 

Proof. Take some user v ^ u and two destinations e, / e A. Assign arbitrary prob- 
abilities in p^ to all destinations except for /, and let C = 1 - J2s=jie / Ps ■ Then p"j = C-pl ■ 
Consider E[i\XDiu) = rf] as a function ofp^. The terms of Equation[l]that correspond 
to any fixed S and A" are of the following general form, where c^, c|, Cg, cl, c|, Cg > 0: 

clPl + cUC-pl)+cl 

This is a convex function of p^: 

r^2 . ^ 2(4(c| - cp + 4(clC + 4) - cl(4C + 4)? > 
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The leading two terms of El'^lXniu) = d] are constant in p"", and the sum of con- 
vex functions is a convex function, so = d] is convex in p^. Therefore, a 
maximum oi E[i!\XD{u) ~ d] must occur when e {0, 1}. □ 

Order the destinations d = di,. . . , d\A\ such that p^. > p^.^^ for i > 1. The following 
lemma shows that we can further restrict ourselves to distribution vectors in which, 
for every user except u, the user either always chooses d or always chooses (i|A|- 

Lemma 3.5. A maximum of E[ii\X]:){u) ~ d] must occur when, for all users v, either 

Proof. Assume, following Lemma [3. 4[ that {p'") v^u is an extreme point of the set 
of possible distribution vectors. 

Equation[T]groups configurations first by the set S with unobserved inputs and sec- 
ond by the observed destinations A°. Instead, group configurations first by S and sec- 
ond by the set T c S with observed outputs. Because every user except u chooses a 
destination deterministically, \1/ only depends on the sets S and T. Let 5*1 (5*, T) be this 
value. 

E[<S'\XDiu)^d]= b{l-b)p:f,+b^+ 

Select two destinations di,dj,l < i < j. We break up the sum in Equation |4] and 
show that, for every piece, the sum can only be increased by changing {p^)v so that any 
user that always chooses di always chooses dj instead. 

Fix S C U such that u e S. Let S^, Sj C S* be such that j?^^ = 1 if and only if s e S^, 
and p^^ = 1 if and only if s e Sj. Fix T' C S\S.,\Sj and some t > |r'|. 

Let f{S, T') be the sum of terms in Equation |4] that are indexed by S and some T 
such that \T\ = t and T D T'. To calculate f{S,T'), group its terms by the number td^ 
of users d in T such that Xd{v) — di. Let te be the number for these terms of users v in 
T' such that Xd{v) = e, e e A\{di,dj}. The number td^ of users v such that Xd{v) ~ dj 
for these terms is then t — J2eeA-d - ^e- Let Sg be the number of users v in S — u such 
that Xd{v) = e. The number of terms in f{S, T') with a given td^ is then 




For each of these terms, 5'i is the same. To calculate it, let fs be the number of con- 
figurations that yield the given S and (te)eeA and are such that us output is observed 
with destination 5: 




and let /o be the number of configurations that yield the same S and (te)e6A and are 
such that us output is unobserved: 

Then the posterior probability given S and (te)eeA is 

pg(/dfa.) + /ofaJ) 

Y.s<,AP'sfs{td,) + h{td^y 
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Therefore, letting m = t- J2eeA\{di4,} 

The binomial coefficients of fs and /o in the numerator and denominator largely 
cancel, and the whole expression can be simplified to 



m 

/(5,r')=a ^ 
t 



^0 y^diJ \m-tdj I PdXsd, +l){sdj +l-m + tdj+ 

(sd. + 1 - td,){sd^ + 1 - m + <dJ/3 ^ 
for some a, /3 > 0. 

This can be seen as the weighted convolution of binomial coefficients. Unfortunately, 
there is no obvious way to simplify the expression any further to find the maximum as 
we trade off s^. and . There is a closed-form sum if the coefficient of the binomial 
product is a fixed-degree polynomial, however. Looking at the coefficient, we can see 
that it is concave. 

{sdi+'i--tdi)(sd - +l-m+td. ) 



PSi(«d,+l)(sdj+l-m+tdi)+P3^(sd,.+l)(sdi+l-t<ii) + (sdi+l-tcii)(sdj+l-m+t<ii) 



2((,Sd, + l){sd, + 1)(2 + Sd, + Sd, - mfplpl+ \ 
^ , b{{sd^ + l){sd, + 1 + trf, ~ mfpl + [sd, + l){sd„ + 1 - trfJX.)) ) 

*d, ((Sdj +l+tdi ~m)(b{sd. +l-td, )+p^^ (sd, +l)) + (sdj +l)(sd, +l-tdi 

< 0. 

We can use this fact to bound the sum above by replacing with a line tangent at 

some point jq. Call this approximation /. Holding Sd^ +Sdj constant, this approximation 
is in fact equal at Sd^ ~ because the sum has only one term. Then, if s^, = still 
maximizes the sum, the theorem is proved. Let — Dt^^ct^, 

/(^.n<f(::;)(„.\)«('. 

Sd^+Sd^ ( , m-Sd^ 



Sdi + Sd, 



f{S, T'). 



The linear approximation will be done around the point io = m ■ SdJ{sd, + Sd^)- This 
results in a simple form for the resulting approximation, and also the mass of the 
product of binomial coefficients concentrates around this point. Set v = Sdi + Sd, to 
examine the tradeoff between s^, and Sd^ . 

A*.^') = (:) {"-^) 

{{i^ - Sd,){i' - m) + v){{sd^ + -m - Sd^) 



Pd.'^isd, + - Sd.){i^ - m) + v)+ 

"^.viv - Sd, + \){y + Sd,{v - m))+ 
f3{{sd, + 1)1^ - m ■ Sdi){{v - SdJ(i^ ~m) + ly) 
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Lemma [A. II in the Appendix shows that / is convex in Sd^. Thus, the maximum of / 
must exist at Sd^ = or sd, = v- Observe that when s^. = 0, 

\ — m -\- V 



m) pdj {I + ly) + — m + + pd,{l - m + u) 



and when Sd. = v 

v\ 1 — m + v 



f 



m 



Pdj (1 - "7, + j^) + - m + + pd, (1 + i^) ' 



Therefore, because pd, > Pdj, f is larger when Sd^ = 0. As stated, this imphes that / 
itself is maximized when Sd^ = 0. 

□ 

Therefore, in looking for a maximum we can assume that every user except u either 
always visits d or always visits (i|A|- To examine how anonymity varies with the num- 
ber of users in each category, we derive an asymptotic estimate for large n. A focus 
on large n is reasonable because anonymity networks, and onion routing in particular, 
are understood to have the best chance at providing anonymity when they have many 
users. Furthermore, Tor is currently used by an estimated 500,000 people. 

Let a = {v ^ u : p^ = l}/{n - I) he the fraction of users that always visit d. The- 
orem 13.61 gives an asymptotic estimate for the expected posterior probability given a 
constant a. It shows that, in the limit, the maximum expected posterior probability is 
obtained when all users but u always visit d or when they always visit d\A\ ■ 

Theorem 3.6. Assume that, for all v ^ u, either p"^ = 1 orp^ =1. Then, if a = 0, 



ErnXoiu) = d\ = b{l - b)p:i + b' + il-b)\^b+ 
ifO < a <1 



E\<^\Xu{u) = d\ = 6(1 - b)pl + b' + {l- 5)^-^^^^-^ + O ' • 
and, if a = 1, 



E[^\XUu) = d]= b{l b)p^d +b' + ii- ^) Y^f^ + o i^f^j ■ 

Proof. Let rig = — 1) andn/ = (1 — Q;)(n— 1). The expected posterior probability 
can be given in the following variation on Equation |4) 

Ei^lXoiu) ^d] = b{l - b)p'^ + 62 + (1 - 6)- 

5:("M(l-6r6--^(7)(l-6)/6'v-/. 
/=o ^ 

} 

5:n6'(i-6)/-tQ^'^'(i-^) 



e—k 



[6*2(e, /, J, fc + 1) + (1 - b)^2ie, /, J, k)] 
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Here ^'2(6, k) is the value of 4' when the users with unobserved inputs consist of 
u, e users v ^ u with = 1, and / users v ^ u with p'^^^^ = 1; and the users with 

unobserved inputs and observed outputs consist of k users v with Xoiv) = d and j 
users 1; with Xoiv) = d\^\. Given such a configuration, the number of indistinguish- 
able configurations in which u has observed destination d is the number of 
indistinguishable configurations in which u has observed destination d\^\ is ():) {jLi), 
and the number of indistinguishable configuration in which u has an unobserved des- 
tination is (^) . Thus, we can express *2 as 



«'2(e,/,j,fc) 



The binomial coefficients largely cancel, and so we can simplify this equation to 

Pd(e + l)(/-j + l) 



*2(e,/,j,fc) 



p:ikif-j + l)+pli{e-k + l) + {e-k + l){f-j + iy 



We observe that j and k are binomially distributed. Therefore, by the Chernoff 
bound, they concentrate around their means as e and / grow. Let ^1 = fb be the mean 
of j and fi2 = eb be the mean of k. We can approximate the tails of the sums over j and 
k in Equation [5] and sum only over the central terms: 



E[^\Xd{u) =d\= 6(1 - &)p2 + b^+ 

(1 ' ^) E (";) (1 - ^)^^'""^ E (7) (1 - ^)'^"^"'- 

O (exp(-2ci)) + O (cxp(-2c2)) + 



(6) 



(&M'2(e, /, j, fc + 1) + (1 - b)^2ie, /, J, fc)) 



As j and fc concentrate around their means, ^'2 will approach its value at those 
means. Let 

£i{j,k,u) = *2(e, /, fc + u) - 1'2(e,/,/^i,/^2 + u) 

be the difference of *2 from its value at ^1 and ^2+u, where u e {0, 1} indicates if us 
output is observed. 



ACM Transactions on Information and System Security, Vol. V, No. N, Article A, Publication date: January YYYY. 



Probabilistic Analysis of Onion Routing in a Black-box Model 



A:19 



is non-increasing in j and is non-decreasing in k: 

D,^2 = 



(l + e)(l + /)(l + e-/c)p«^|P« 



(l + /)(l + e-/c^^,+ 
(l + f-j- u){p]i{e + 1) + (1 - P:i~PlJ{l + e - fc)) 



< 0. 



(1 + / - + + (1 + e - fc - - - p^i^i )) 



> 0. 



Because the signs of these derivatives are constant, the magnitude of ei is largest 
when i and k are as large or as small as possible. We can therefore bound the magni- 
tude of £i with 



/C2e,w 



3iax { £i i Hi + (T\/c^, H2 + a- 
{-1,1} V V 

*2(e, /, Hi + a-\/chf, H2 + o-^/c^e + u) - *2(e, /, Mi; M2 + u] 



n6{0,l} 



max 
cre{-iA} 

u6{0,l} 



where the second line follows from a simple expansion of ^2 according to Eauation l3.2[ 
We use this estimate to approximate the value of ^'2: 

*2(e,/,j, fc + w) = 'i'2{e, f, Hi, H2 + u) + si{j,k,u) 

We set ci = log(/)/4 and C2 = log(e)/4, and then Equation [6] becomes 

El^lXoiu) = d] = 6(1 - b)p^ + 



(1 - ^) E ( ; ) (1 - ^rb-^-' E (7 ) (i - ^)'^"^~'- 



(7) 



6*2(e, /, Hi,H2 + l) + {l- fc)*2(e, /, Mi, ^2)+ 



Oi^\og{f)/f)+OiVMe)/e 



e and / in this expression are binomially distributed. Let ^3 = '^e(l — b) be the mean 
of e and ^4 = J^/ll — fe) be the mean of /. By applying the Chernoff bound to the sum 
over e, setting the tails to start at min(fe, 1 - b)ne/2 from /13, we can see that 



e=0 



/=0 
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We can similarly show that 



5] ( ) (1 - bTb-^-^- J2 ( 7 ) (1 - bVb-^-^O ( Vbi(7)77) = O i ^\og{nf)/nf 

e=0 ^ ^ /=0 \ ^ \ 

For the remaining terms inside both sums, approximate the sums over e and / using 
the Chernoff bound by setting the tails to be those terms more than ^/c^fT^ from ^3 and 
more than ^c^nf from ^t4, respectively This yields 

E[^\Xd{u) = d] = 6(1 - + 

O ({login,) I n,)-^'^) + O ({log{nf) /nf)-^'^) + O (e-^^^) + O (e-^-^) + 

(1-^) E h\i-bw-- E (7)a-^/^--^- 

[b^2{e, /, Ail, + 1) + (1 - &)*2(e, /, /^i, /^2)] ■ 

As e and / concentrate around their means, ^2 will approach its value at those 
means. Let 

£2(6, /, u) = 'i'2(e, /, Ail, Ai2 + w) - '^2[^^■i, f^4, Mi, M2 + u) 

be the difference of *2 from its value at e = /is and / = fii, u e {0, 1}. 5*2 (e, /, ^1,^2) in 
non-decreasing with respect to e: 



(1 + (1 - 6)/)6p:1((/ + 1)(1 - p:^) - fb{l -p^^- pl^^ )) 



(l + (l-5)/)(l + (l-5)e)+ 
(l + (l-6)/)(5e)p2|+ 
&/(l + (1 - b)e + 



2 



> 0. 

*2(e, /, /ii, M2 + 1) is non-increasing with respect to e: 

(1 + (1 - b)J){l - b)p^^{fb{l ~ pl^^ - p^^) - (/ + 1)(1 - f^)) 



A;*2(e,/,Ail,Ai2) 



/((l-6)/)(l + (l-%)+ 
(l + (l-6)/)(6e+l)p^ + 
\bf{{l-b)e)p-, 



< 0. 

*2(e, /, Ml, M2 + w), w G {0, 1}, is non-increasing with respect to /: 

-6(l + e)(l + (l-6)e + K)pM 
'(l + (l-&)/)(l + (l-6)e + u)+^ ' 

&/(! + (1-6)6 + 7.^^1 
< 0. 

Therefore, the magnitude of £2 is largest when e and / are as large or as small as 
possible. We can therefore estimate the magnitude of £2 with 



max 

TG{-1,1} 
mG{0,1} 



(1^2 (m3 + (7^/C3ne, ^4 + CT^CiUf, u)\) 
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If ne,n/ 7^ 0, 

*2(M3,A*4,A'l,Ai2 + It) 

If ne = 0, which occurs when a = 0, 

£2 (0, /X4 + ay/C4,nf, u) = ^2(0, /i4 + (Ty/C4,nf, ni,u) - ^2(0, /X4, /xi, m) 



If = 0, which occurs when a ~ 1, the final term becomes 



£2 (^3 + (Ty^CsUe, 0, w) = ^"2(^3 + CTy^C^, 0, 0, ^^2 + ") - ^'2(Ai3, 0, 0, /i2 + u) 

= O (^y/c^/rCj ■ 

These as ymp totic estimates of £2 follow from a simple expansion of ^'2 according to 
Equation [mi 

We use them estimate to approximate the value of ^2 as e and / grow: 

^•2(6, /, All, AI2 + m) = *2(M3, Ai4, fil,fi2+u)+ £2(6, /, u) 

= *2(/i3,A^4,A*i,A^2 +u) + (^VcsT^) + O (^^J Ci/nf 

We set C3 = log(ne)/4 and C4 = log(n/)/4, and then Equation [8] becomes 

Ei^lXoiu) = d]= 6(1 - b)p'^ + 

(1 - b) [b^2{m, M4, Ml, M2 + 1) + (1 - &)*2(Ai3, M4, Ml, M2)] + (9) 

O ((Zo,gK)/n,)-i/^) + O [{log{nf)/nf)-'/') . 

Finally, we must estimate ^"2(^3, M4, Mi, M2 + u), u e {0,1}. Assume that < a < 1 
and thus that ne — a{n — 1) and n/ = (1 — — 1) are both increasing with n. Then 

*2(M3, M4, Ml, M2 + u) = ^'2((1 - b)ne, (1 - b)nf, 6(1 - b)nf, 6(1 - b)ne + u) 

_ p^(l - bYuerif + cirte + C2n/ + C3 

((1 - 6)4 + p^(l - bfb + (1 - bfb)n,nf+ \ 
C4ne + C5nf + ce J 

+ 0(1K) + 0{l/nf) + 0(l/(?7,en/)), 



l-6 + p^6 + pJJ^^^6 

where ci, . . . , cg are some values constant in n^, and n/. When a = 0, then = 0, and 
the estimate becomes 

^'2(M3,M4,Mi,M2 + u) = ^'2(0, (1 - 6)n/,6(l - b)nf,u) 



((1 - «)(1 - 6) + p>(l - 6) + p« ^1 (1 - u)b)nf + C2 
- ^') 



((1 - ii)(l - 6) + p>(l - 6) + p« , (1 - u)b) 



Oil/nj), 
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where ci, C2 are some values constant in ti/. When a = 1, then n/ = 0, and the estimate 
becomes 

*2(a*3, M4, Ml: M2 + u) = *2((1 - b)ne, 0, 0, 6(1 - b)ne + u) 

" ((l-6)+p^5)n, + C2 

= , fi +0(l/ne), 
1 - b + p'^b 

where ci, C2 are some values constant in Ue. 

Inserting these estimates for 5'2(a*3i M4i Mi) A*2 + u) into Equation |9] yields the theo- 
rem. □ 

It follows from this theorem that the worst case anonymity over user distributions 
occurs either when all users always visit d|De;ta| or when all users always visit d. 

Corollary 3.7. lim„^inf E[ii\XD(u) ~ d] is maximized either at a = or at a = 1. 

Pr oof . The case a = 1 is larger in the limit than the case where < a < 1, by 
Thm. 13. 6[ because 

Pd < Pd 

□ 

The case a = 1 is the worst case only when 

u ^ {i-b)(i-p-,r 

This happens when > 1/2 and Pd^^^ is near 1 — pj^. That is, if the user is likely to 

visit d and the other users can't distinguish themselves too much, then it is worst to 
have them always visit d because the adversary will blame u. 
However, we would expect p'^i^^^ to be small because it is at most 1/|A|. In this case 

the worst-case limiting distribution has a = 0, that is, it is worst when the other 
users always act very different from u by visiting d\^\. Then the expected assigned 
probability is about & + (1 — 6)p^. This is equal to the lower bound on the anonymity 
metric when the adversary controls a fraction Vb of the network. 

4. TYPICAL DISTRIBUTIONS 

It is unlikely that users of onion routing will ever find themselves in the worst-case situation. The necessary distributions just do not resemble what we expect user behavior to be like in any realistic use of onion routing. Our worst-case analysis may therefore be overly pessimistic. To get some insight into the anonymity that a typical user of onion routing can expect, we consider a more realistic set of users' destination distributions in which each user selects a destination from a common Zipfian distribution. This model of user behavior is used by Shmatikov and Wang [2006] to analyze relationship anonymity in mix networks and is motivated by observations that the popularity of sites on the web follows a Zipfian distribution.
Let each user select his destination from a common Zipfian distribution $p$: $p_{d_i} = 1/(\mu i^s)$, where $s > 0$ and $\mu = \sum_{i=1}^{|\Delta|} 1/i^s$. It turns out that the exact form of the distribution doesn't matter as much as the fact that it is common among users.
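Proof. Let $p$ be the common destination distribution. The expected assigned probability can be expressed as:

$$E[Y \mid X_D(u) = d] = b^2 + b(1-b)p_d + (1-b)\sum_{s=1}^{n}\binom{n-1}{s-1} b^{n-s}(1-b)^{s-1} \sum_{t=0}^{s} b^{t}(1-b)^{s-t}\left[\binom{s-1}{t-1}\sum_{\Lambda \in \Delta^t:\, \Lambda_1 = d}\ \prod_{i=2}^{t} p_{\Lambda_i}\,\Psi_4(s,\Lambda) + \binom{s-1}{t}\sum_{\Lambda \in \Delta^t}\ \prod_{i=1}^{t} p_{\Lambda_i}\,\Psi_4(s,\Lambda)\right] \qquad (10)$$

with binomial coefficients taken to be zero outside their usual range.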



Here, $s$ represents the size of the set of users with unobserved inputs, $t$ represents the size of the subset of those $s$ users that also have observed outputs, $\Lambda$ represents the $t$ observed destinations, and $\Psi_4(s,\Lambda)$ is the posterior probability. In this situation, $\Psi^*$ is unambiguous given $s$ and $\Lambda$. Let $\Lambda_d = |\{x \in \Lambda : x = d\}|$. Then $\Psi_4$ can be expressed simply as:



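$$\Psi_4(s,\Lambda) = \frac{\Lambda_d\,(s-1)_{|\Lambda|-1} + p_d\,(s-1)_{|\Lambda|}}{s_{|\Lambda|}} = \frac{\Lambda_d + p_d(s-t)}{s},$$

where $(x)_k = x(x-1)\cdots(x-k+1)$ denotes the falling factorial.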
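The closed form $(\Lambda_d + p_d(s-t))/s$ can be sanity-checked by brute force on small instances. The sketch below assumes the situation described in the proof: the $s$ users with unobserved inputs (user 0 stands in for $u$) choose destinations independently from the common distribution $p$, a uniformly random $t$ of them have observed outputs, and the adversary sees only the multiset of observed destinations. It enumerates every assignment and every observed subset with exact fractions. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def psi4_formula(s, observed, d, p):
    """(Lambda_d + p_d * (s - t)) / s for an observed destination tuple."""
    t = len(observed)
    lam_d = sum(1 for x in observed if x == d)
    return (lam_d + p[d] * (s - t)) / s


def psi4_brute_force(s, observed, d, p):
    """Posterior that user 0 (playing u) visits d, by exhaustive enumeration.

    Assumed model: s users pick destinations i.i.d. from p, a uniformly
    random subset of t of them has observed outputs, and only the multiset
    of those outputs is seen.
    """
    t = len(observed)
    target = Counter(observed)
    subsets = list(combinations(range(s), t))
    num = den = Fraction(0)
    for dests in product(range(len(p)), repeat=s):
        weight = Fraction(1)
        for x in dests:
            weight *= p[x]
        for obs in subsets:
            # Keep only worlds consistent with what the adversary observed.
            if Counter(dests[i] for i in obs) == target:
                den += weight / len(subsets)
                if dests[0] == d:
                    num += weight / len(subsets)
    return num / den


if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(psi4_brute_force(3, (0, 2), 0, p), psi4_formula(3, (0, 2), 0, p))
```

The agreement reflects the symmetry behind the formula: $u$ is among the observed users with probability $t/s$, in which case it accounts for each observed $d$ equally, and otherwise its destination is still distributed as $p$.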



The sum

$$\sum_{\Lambda \in \Delta^t:\, \Lambda_1 = d}\ \prod_{i=2}^{t} p_{\Lambda_i}\,\Psi_4(s,\Lambda)$$

in Equation (10) calculates the expectation of $\Psi_4$ conditioned on $s$ and $t$. The expression for $\Psi_4$ shows that this expectation depends linearly on the expected value of $\Lambda_d$. The expectation of $\Lambda_d$ is simply $1 + p_d(t-1)$, because one destination in this case is always $d$, and each of the other $t-1$ is $d$ with probability $p_d$. The sum

$$\sum_{\Lambda \in \Delta^t}\ \prod_{i=1}^{t} p_{\Lambda_i}\,\Psi_4(s,\Lambda)$$

in Equation (10) similarly depends linearly on the expectation of $\Lambda_d$, which in this case is $p_d t$.
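The linearity argument can be checked the same way. Averaging $\Psi_4$ over the tuple $\Lambda$ should give $(1 + p_d(t-1) + p_d(s-t))/s$ when $\Lambda_1 = d$ is fixed, and $(p_d t + p_d(s-t))/s = p_d$ when all $t$ entries are free. The sketch below does this by exhaustive enumeration with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def psi4(s, lam, d, p):
    """Psi_4(s, Lambda) = (Lambda_d + p_d * (s - t)) / s."""
    return (sum(1 for x in lam if x == d) + p[d] * (s - len(lam))) / s


def expected_psi4(s, t, d, p, first_is_d):
    """Average of Psi_4 over Lambda, each tuple weighted by its probability.

    first_is_d=True: Lambda_1 = d is fixed (u's own observed output) and
    Lambda_2..Lambda_t are i.i.d. from p. Otherwise all t entries are i.i.d.
    """
    if first_is_d and t < 1:
        raise ValueError("Lambda_1 = d requires t >= 1")
    free = t - 1 if first_is_d else t
    total = Fraction(0)
    for rest in product(range(len(p)), repeat=free):
        weight = Fraction(1)
        for x in rest:
            weight *= p[x]
        lam = ((d,) + rest) if first_is_d else rest
        total += weight * psi4(s, lam, d, p)
    return total


if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    s, t, d = 5, 3, 1
    # Plugging E[Lambda_d] into the linear formula gives the same values.
    print(expected_psi4(s, t, d, p, True), (1 + p[d] * (t - 1) + p[d] * (s - t)) / s)
    print(expected_psi4(s, t, d, p, False), (p[d] * t + p[d] * (s - t)) / s)
```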

With these observations, it is a straightforward calculation to show that the sum over $t$ in Equation (10) is simply

$$\frac{b\,\big(p_d(s-1) + 1\big)}{s} + (1-b)p_d.$$
We insert this into Equation (10) and simplify:

$$E[Y \mid X_D(u) = d] = b^2 + b(1-b)p_d + (1-b)\sum_{s=1}^{n}\binom{n-1}{s-1} b^{n-s}(1-b)^{s-1}\left[\frac{b\,\big(p_d(s-1)+1\big)}{s} + (1-b)p_d\right] = b^2 + (1-b^2)p_d + O(1/n).$$

□
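The sum over $t$ can be verified mechanically. The sketch below assumes that, for fixed $s$, Equation (10) weights each $t$ by $b^t(1-b)^{s-t}$, with $\binom{s-1}{t-1}$ for the case where $u$'s output is among the $t$ observed ones and $\binom{s-1}{t}$ otherwise (a reading of the damaged source equation). It plugs in the conditional expectations of $\Psi_4$ from the proof. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def inner_sum_over_t(s, b, p_d):
    """Sum over t in Equation (10) for fixed s (assumed weights, see lead-in)."""
    total = Fraction(0)
    for t in range(s + 1):
        weight = b**t * (1 - b) ** (s - t)
        observed = comb(s - 1, t - 1) if t >= 1 else 0  # u among the t observed
        unobserved = comb(s - 1, t)                      # comb returns 0 when t == s
        e_first = (1 + p_d * (t - 1) + p_d * (s - t)) / s  # E[Psi_4 | Lambda_1 = d]
        e_free = p_d                                       # E[Psi_4], all entries free
        total += weight * (observed * e_first + unobserved * e_free)
    return total


def closed_form(s, b, p_d):
    """b * (p_d * (s - 1) + 1) / s + (1 - b) * p_d."""
    return b * (p_d * (s - 1) + 1) / s + (1 - b) * p_d


if __name__ == "__main__":
    b, p_d = Fraction(1, 5), Fraction(2, 7)
    for s in (1, 2, 5, 10):
        print(s, inner_sum_over_t(s, b, p_d), closed_form(s, b, p_d))
```

The identity holds because the observed-case expectation does not depend on $t$, and the two families of binomial weights sum to $b$ and $1-b$ respectively.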



Our results show that the expected value of the anonymity metric is close to $b^2 + (1 - b^2)p_d$ for large populations. This amount matches the lower bound shown in Thm. 3.3.
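To see how fast the expectation approaches $b^2 + (1 - b^2)p_d$, the sketch below evaluates the outer sum exactly under the same assumed weights $\binom{n-1}{s-1} b^{n-s}(1-b)^{s-1}$. Using $\binom{n-1}{s-1}/s = \binom{n}{s}/n$, the sum works out to $b^2 + (1 - b^2)p_d + b(1-p_d)(1-b^n)/n$. This exact correction term is derived here for illustration under those assumptions and is not a formula stated in the text; the code checks it against direct summation. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_metric(n, b, p_d):
    """E[Y | X_D(u) = d] for n users, summing the reconstructed Equation (10)."""
    total = b**2 + b * (1 - b) * p_d
    for s in range(1, n + 1):
        weight = comb(n - 1, s - 1) * b ** (n - s) * (1 - b) ** (s - 1)
        inner = b * (p_d * (s - 1) + 1) / s + (1 - b) * p_d
        total += (1 - b) * weight * inner
    return total


def exact_correction(n, b, p_d):
    """b (1 - p_d) (1 - b**n) / n, derived here via comb(n-1, s-1)/s == comb(n, s)/n."""
    return b * (1 - p_d) * (1 - b**n) / n


if __name__ == "__main__":
    b, p_d = Fraction(1, 10), Fraction(1, 20)
    limit = b**2 + (1 - b**2) * p_d
    for n in (1, 10, 100, 500):
        gap = expected_metric(n, b, p_d) - limit
        print(n, float(gap), gap == exact_correction(n, b, p_d))
```

Since the correction is at most $b(1-p_d)/n$, the convergence is of order $1/n$, consistent with the $O(1/n)$ term in the proof.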

5. CONCLUSIONS AND FUTURE WORK 

We expect each user of an anonymity network to have a pattern of use. In order to 
make guarantees to the user about his anonymity, we need to take this into account 
when modeling and analyzing the system, especially in light of previous research that 
indicates that an adversary can learn these usage patterns given enough time. 

We perform such an analysis on onion routing. Onion routing is a successful design 
used, in the form of the Tor system, by hundreds of thousands of people to protect their 
security and privacy. But, because it was designed to be practical and because theory 
in this area is still relatively young, the formal analysis of its privacy properties has 
been limited. 

We perform our analysis using a simple black-box model in the UC framework. 
We justify this model by showing that it information-theoretically provides the same anonymity as the onion routing protocol formalized by Feigenbaum et al. [2007]. Furthermore, it should lend itself to the analysis of other anonymity protocols expressed
within the UC framework. We investigate the relationship anonymity of users and 
their destinations in this model and measure it using the probability that the adver- 
sary assigns to the correct destination of a given user after observing the network. 

Our anonymity analysis first shows that a simple, standard approximation to the 
expected value of the anonymity metric provides a lower bound on it. Then we consider 
the worst-case set of user behaviors to give an upper bound on the expected value. We 
show that, in the limit as the number of users grows, a user's anonymity is worst 
either when all other users choose destinations he is unlikely to visit, because that 
user becomes unique and identifiable, or when that user chooses a destination that all 
other users prefer, because the adversary mistakes the group's choices for the user's 
choice. This worst-case anonymity with an adversary that controls a fraction b of the 
routers is comparable to the best-case anonymity against an adversary that controls a 
fraction $\sqrt{b}$.

The worst case is unlikely to arise for any user, so we investigate anonymity under a more reasonable model of user behavior suggested in the literature. In it, users select destinations from a common Zipfian distribution. Our results show that, in this case and in any case with a common distribution, the expected anonymity tends to the best possible, i.e., the adversary usually does not gain much knowledge from the other users' actions.

Future work includes extending this analysis to other types of anonymity (such 
as sender anonymity), extending it to other anonymity networks, and learning more 
about the belief distribution of the adversary than just its mean. A big piece of the 
attack we describe is in learning the users' destination distribution, about which only
a small amount of research, usually on simple models, has been done. The speed with 
which an adversary can perform this stage of the attack is crucial in determining the 
validity of our attack model and results. 

In response to analyses such as that of Øverlier and Syverson [2006], the current
Tor design includes entry guards by default for all circuits. Roughly, this means that, 
since about January 2006, each Tor client selects its first onion router from a small set 
of nodes that it randomly selects at initialization. The rationale is that communication 
patterns of individuals are what need to be protected. If an entry guard is compromised, then the percentage of compromised circuits from that user is much higher.
But, without entry guards, it appears that whom that user communicates with and 
even at what rate can be fairly quickly learned by an adversary owning a modest per- 
centage of the Tor nodes anyway. If no entry guard is compromised, then no circuits 
from that user will ever be linked to him. However, if a user expects to be targeted 
by a network adversary that can control nodes, he can expect his entry guards ultimately to be attacked and possibly compromised. If his most sensitive destinations are rarely contacted, he may thus be better off choosing first nodes at random. How can we know which is better? Extending our analysis to include entry
guards will allow us to answer or at least further illuminate this question. 

Our model also assumes that client connections to the network are such that the 
initial onion router in a circuit can tell that it is initial for that circuit. This is true 
for the overwhelming majority of traffic on the Tor network today, because most users 
run clients that are not also onion routers. However, for circuits that are initiated 
at a node that runs an onion router, a first node cannot easily tell whether it is the first node or the second, at least not without resorting to other attacks of unknown efficacy, e.g., monitoring latency of traffic moving in each direction in response to traffic moving in the other direction. Thus, that initiating edge of the black box is essentially fuzzy.
Indeed, this was originally the only intended configuration of onion routing for this 
reason [Goldschlag et al. 1996]. The addition of clients that do not also function as routers was a later innovation that was added to increase usability and flexibility [Reed et al. 1998; Syverson et al. 2000]. Similarly, peer-to-peer designs such as Crowds [Reiter and Rubin 1998] and Tarzan [Freedman and Morris 2002] derive their security even more strongly from the inability of the first node to know whether it is first or not. Thus, extending our model and analysis to this case will make it still more broadly applicable.

A. APPENDIX 

Let $f$ be as defined in Lemma 3.3.
Lemma A.l. / > o. 

Proof. Let i = Sd. and = v — m for simplicity. Then 

j ^ {v^i^l){v + {l'-i)^l) 

P^v{v + - i + i^) + (1 + i)p'^v{v - i^L + + + ifi){i' - ifJ. + i^/i) 

The second derivative of / can be expressed as 
where



{~^HpI -plV{{^ + j){pl +Pl) + P^^)+ 

ii\i+jV{pi +pi +Pifim+3)iPi +vi)+M- 

+ + 3){pl + Pi ) + {-Pi + Pi (1 + + 

{^ + jf {{^ + ]){plf{l + ^J^f + Pi ((* + J)PI + 



Pi + /i)' + Pi (2 + - .? + + (1 + * + ) ) 



+ j){i{pl + /3) + i(p'^. + ipl + jp^^. + /3))Ai + 



substituting $(i + j)$ for $v$. $D$ is clearly positive. Therefore we must just show that $N$ is non-negative.

We collect terms in N by the coefficients , Pdj , and (3: 



2pd./3(j + +j- + J + + 

2M,)'(» + .7)'(* + J-m)(» + .7 + »m)'+ 
2{pl)\^+J?i^+J-^^K^+J+J^^f + 
2plpl + + J + + m)- 

(z2 (-1 + + J {f^{2 + ^i) +j{-l + fi^))+i {^i(2 + /i) - J (2 + m'))) . 



The coefficients of the terms in pd^ and pd^ are clearly positive because i + j = v > 

v — m ~ fi. 

If we collect the remaining terms by $i$ and $j$, we get



((PdJ' + (pl + A^)' + pIpI (-2 - + 2/i2 + + 
f {{Pir + + m)' + (-2 - M + 2^^ + M^)) + 

$3i^2 j\left(p_{d_i}^2(1+\mu) + p_{d_j}^2(1+\mu)^2 - p_{d_i}p_{d_j}(2+\mu)\right) +$

$3ij^2\left(p_{d_j}^2(1+\mu) + p_{d_i}^2(1+\mu)^2 - p_{d_i}p_{d_j}(2+\mu)\right).$
The coefficients for the $i^3$ and $j^3$ terms are clearly non-negative when $\mu \ge 1$. When
yu = 0, observe that the coefficients become {p^^ — Pd^)"^ ^ 0- I'he coefficients for the i^, 
j"^, and ij terms are also clearly non-negative. 

To show that the $i^2 j$ term is non-negative, we use the fact that $p_{d_i}$ and $p_{d_j}$ are probabilities that sum to at most one. Let $p_{d_j} = C - p_{d_i}$, $0 \le C \le 1$. Then the coefficient of $i^2 j$ becomes a quadratic function of $p_{d_i}$ with positive second derivative. Its minimum is at

$$p_{d_i} = \frac{4C + 5C\mu + 2C\mu^2}{2(2+\mu)^2}.$$

The coefficient evaluated at this point is

$$\frac{C^2\mu\,(8 + 11\mu + 4\mu^2)}{4(2+\mu)^2},$$

which is non-negative. Therefore, the whole $i^2 j$ term is non-negative.
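The minimization step can be checked with exact rational arithmetic. The sketch below assumes the coefficient of the $i^2 j$ term has the form $p_{d_i}^2(1+\mu) + p_{d_j}^2(1+\mu)^2 - p_{d_i}p_{d_j}(2+\mu)$. This is a reading of the damaged display, and it is consistent with the stated minimizer and minimum. The sketch substitutes $p_{d_j} = C - p_{d_i}$ and confirms both values; the function names are hypothetical.

```python
from fractions import Fraction


def coeff_i2j(p_i, C, mu):
    """Assumed i^2 j coefficient, with p_{d_j} = C - p_{d_i}."""
    p_j = C - p_i
    return p_i**2 * (1 + mu) + p_j**2 * (1 + mu) ** 2 - p_i * p_j * (2 + mu)


def argmin_i2j(C, mu):
    """Stated minimizer (4C + 5C mu + 2C mu^2) / (2 (2 + mu)^2)."""
    return (4 * C + 5 * C * mu + 2 * C * mu**2) / (2 * (2 + mu) ** 2)


def min_i2j(C, mu):
    """Stated minimum C^2 mu (8 + 11 mu + 4 mu^2) / (4 (2 + mu)^2)."""
    return C**2 * mu * (8 + 11 * mu + 4 * mu**2) / (4 * (2 + mu) ** 2)


if __name__ == "__main__":
    h = Fraction(1, 7)
    for C in (Fraction(0), Fraction(1, 3), Fraction(1)):
        for mu in (Fraction(0), Fraction(1, 2), Fraction(3)):
            x = argmin_i2j(C, mu)
            assert coeff_i2j(x, C, mu) == min_i2j(C, mu) >= 0
            # The quadratic opens upward, so moving away from x cannot go lower.
            assert coeff_i2j(x + h, C, mu) >= coeff_i2j(x, C, mu)
            assert coeff_i2j(x - h, C, mu) >= coeff_i2j(x, C, mu)
    print("i^2 j coefficient checks passed")
```

Under this reading, the quadratic's leading coefficient is $(1+\mu) + (1+\mu)^2 + (2+\mu) = (2+\mu)^2$, which is where the common denominator comes from.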

Similarly, for the $ij^2$ term, we look at its coefficient as a function of $p_{d_i}$ with $p_{d_j} = C - p_{d_i}$. It is also a quadratic function with positive second derivative. Its minimum is found at

$$p_{d_i} = \frac{4C + 3C\mu}{2(2+\mu)^2}.$$

The coefficient evaluated at this point is

$$\frac{C^2\mu\,\big(8 + \mu(11 + 4\mu)\big)}{4(2+\mu)^2},$$

which is non-negative. Therefore, the whole $ij^2$ term is non-negative. This implies that $N$ is non-negative, and thus that the second derivative of $f$ is non-negative. □
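A similar check works for the $ij^2$ term. It assumes the coefficient is $p_{d_j}^2(1+\mu) + p_{d_i}^2(1+\mu)^2 - p_{d_i}p_{d_j}(2+\mu)$, the $i \leftrightarrow j$ mirror of the $i^2 j$ coefficient and again a reading of the damaged display. The sketch checks the stated minimizer and minimum exactly and also confirms non-negativity on a grid of $p_{d_i}$ values in $[0, C]$. The function names are hypothetical.

```python
from fractions import Fraction


def coeff_ij2(p_i, C, mu):
    """Assumed i j^2 coefficient, with p_{d_j} = C - p_{d_i}."""
    p_j = C - p_i
    return p_j**2 * (1 + mu) + p_i**2 * (1 + mu) ** 2 - p_i * p_j * (2 + mu)


def argmin_ij2(C, mu):
    """Stated minimizer (4C + 3C mu) / (2 (2 + mu)^2)."""
    return (4 * C + 3 * C * mu) / (2 * (2 + mu) ** 2)


def min_ij2(C, mu):
    """Stated minimum C^2 mu (8 + mu (11 + 4 mu)) / (4 (2 + mu)^2)."""
    return C**2 * mu * (8 + mu * (11 + 4 * mu)) / (4 * (2 + mu) ** 2)


def nonnegative_on_grid(C, mu, steps=50):
    """True if coeff_ij2 >= 0 at evenly spaced p_{d_i} in [0, C]."""
    return all(coeff_ij2(C * Fraction(k, steps), C, mu) >= 0 for k in range(steps + 1))


if __name__ == "__main__":
    for C in (Fraction(1, 4), Fraction(1)):
        for mu in (Fraction(0), Fraction(1), Fraction(4)):
            x = argmin_ij2(C, mu)
            print(C, mu, x, coeff_ij2(x, C, mu) == min_ij2(C, mu), nonnegative_on_grid(C, mu))
```

At $\mu = 0$ the assumed coefficient reduces to $(p_{d_i} - p_{d_j})^2$, so the grid check is only informative for $\mu > 0$.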